Neural Networks and Spelling Features for Native Language Identification

نویسندگان

  • Johannes Bjerva
  • Gintare Grigonyte
  • Robert Östling
  • Barbara Plank
چکیده

We present the RUG-SU team’s submission at the Native Language Identification Shared Task 2017. We combine several approaches into an ensemble, based on spelling error features, a simple neural network using word representations, a deep residual network using word and character features, and a system based on a recurrent neural network. Our best system is an ensemble of neural networks, reaching an F1 score of 0.8323. Although our system is not the highest ranking one, we do outperform the baseline by far.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Improving Native Language Identification by Using Spelling Errors

In this paper, we explore spelling errors as a source of information for detecting the native language of a writer, a previously under-explored area. We note that character n-grams from misspelled words are very indicative of the native language of the author. In combination with other lexical features, spelling error features lead to 1.2% improvement in accuracy on classifying texts in the TOE...

متن کامل

Cognate and Misspelling Features for Natural Language Identification

We apply Support Vector Machines to differentiate between 11 native languages in the 2013 Native Language Identification Shared Task. We expand a set of common language identification features to include cognate interference and spelling mistakes. Our best results are obtained with a classifier which includes both the cognate and the misspelling features, as well as word unigrams, word bigrams,...

متن کامل

Aircraft Visual Identification by Neural Networks

In the present paper, an efficient method for three dimensional aircraft pattern recognition is introduced. In this method, a set of simple area based features extracted from silhouette of aerial vehicles are used to recognize an aircraft type from its optical or infrared images taken by a CCD camera or a FLIR sensor. These images can be taken from any direction and distance relative to the fly...

متن کامل

Identification of Houseplants Using Neuro-vision Based Multi-stage Classification System

In this paper, we present a machine vision system that was developed on the basis of neural networks to identify twelve houseplants. Image processing system was used to extract 41 features of color, texture and shape from the images taken from front and back of the leaves. The features were fed into the neural network system as the recognition criteria and inputs. Multilayer perceptron (MLP) ne...

متن کامل

LIMSI's participation to the 2013 shared task on Native Language Identification

This paper describes LIMSI’s participation to the first shared task on Native Language Identification. Our submission uses a Maximum Entropy classifier, using as features character and chunk n-grams, spelling and grammatical mistakes, and lexical preferences. Performance was slightly improved by using a twostep classifier to better distinguish otherwise easily confused native languages.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017